open world
Learning to Augment Distributions for Out-of-distribution Detection
Open-world classification systems should discern out-of-distribution (OOD) data whose labels deviate from those of in-distribution (ID) cases, motivating recent studies in OOD detection. Advanced works, despite their promising progress, may still fail in the open world, owing to the lack of advance knowledge about unseen OOD data. Although one can access auxiliary OOD data (distinct from unseen ones) for model training, it remains to analyze how such auxiliary data will work in the open world. To this end, we examine this problem from a learning theory perspective, finding that the distribution discrepancy between the auxiliary and the unseen real OOD data is the key factor affecting open-world detection performance. Accordingly, we propose Distributional-Augmented OOD Learning (DAOL), alleviating the OOD distribution discrepancy by crafting an OOD distribution set that contains all distributions in a Wasserstein ball centered on the auxiliary OOD distribution. We justify that the predictor trained over the worst OOD data in the ball can shrink the OOD distribution discrepancy, thus improving the open-world detection performance given only the auxiliary OOD data. We conduct extensive evaluations across representative OOD detection setups, demonstrating the superiority of our DAOL over its advanced counterparts.
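The worst-case training idea in the abstract above can be written, as a rough sketch only, in the generic form of a distributionally robust objective. Every symbol here is an assumption introduced for illustration and is not taken from the paper: $f_\theta$ is the predictor, $P_{\mathrm{ID}}$ and $P_{\mathrm{aux}}$ are the ID and auxiliary OOD distributions, $W$ is a Wasserstein distance, $\rho$ is the ball radius, $\alpha$ is a trade-off weight, and $\ell_{\mathrm{ID}}$, $\ell_{\mathrm{OOD}}$ are unspecified losses. The paper's exact formulation may differ.

```latex
\min_{\theta}\;
\mathbb{E}_{(x,y)\sim P_{\mathrm{ID}}}
  \big[\ell_{\mathrm{ID}}(f_\theta(x), y)\big]
\;+\; \alpha
\sup_{Q \,:\, W(Q,\, P_{\mathrm{aux}}) \le \rho}
\mathbb{E}_{x\sim Q}
  \big[\ell_{\mathrm{OOD}}(f_\theta(x))\big]
```

The inner supremum picks the hardest OOD distribution within distance $\rho$ of the auxiliary one, which is one way to read "the worst OOD data in the ball".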
Requirements for Recognition and Rapid Response to Unfamiliar Events Outside of Agent Design Scope
Wray, Robert E., Jones, Steven J., Laird, John E.
Regardless of past learning, an agent in an open world will face unfamiliar events outside of prior experience, existing models, or policies. Further, the agent will sometimes lack relevant knowledge and/or sufficient time to assess the situation and evaluate response options. How can an agent respond reasonably to situations that are outside of its original design scope? How can it recognize such situations sufficiently quickly and reliably to determine reasonable, adaptive courses of action? We identify key characteristics needed for solutions, review the state-of-the-art, and outline a proposed, novel approach that combines domain-general meta-knowledge (inspired by human cognition) and metareasoning. This approach offers potential for fast, adaptive responses to unfamiliar situations, more fully meeting the performance characteristics required for open-world, general agents.
- North America > United States > New York (0.05)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > New Jersey > Bergen County > Mahwah (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.70)
MindsEye review – a dystopian future that plays like it's from 2012
It's pretty much a straight copy of the original: a huge soap bubble, half sunk into the desert floor, with its surface turned into a gigantic TV. Occasionally you'll pull up near the Sphere while driving an electric vehicle made by Silva, the megacorp that controls this world. You'll sometimes come to a stop just as an advert for an identical Silva EV plays out on the huge curved screen overhead. The doubling effect can be slightly vertigo-inducing. At these moments, I truly get what MindsEye is trying to do.
- Transportation > Ground > Road (0.71)
- Transportation > Electric Vehicle (0.56)
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Tang, Luyao, Yuan, Yuxuan, Chen, Chaoqi, Zhang, Zeyu, Huang, Yue, Zhang, Kun
Although foundation models (FMs) are claimed to be powerful, their generalization ability significantly decreases when faced with distribution shifts, weak supervision, or malicious attacks in the open world. On the other hand, most domain generalization or adversarial fine-tuning methods are task-related or model-specific, ignoring the universality in practical applications and the transferability between FMs. This paper delves into the problem of generalizing FMs to out-of-domain data. We propose a novel framework, the Object-Concept-Relation Triad (OCRT), that enables FMs to extract sparse, high-level concepts and intricate relational structures from raw visual inputs. The key idea is to bind objects in visual scenes and a set of object-centric representations through unsupervised decoupling and iterative refinement. To be specific, we project the object-centric representations onto a semantic concept space that the model can readily interpret and estimate their importance to filter out irrelevant elements. Then, a concept-based graph, which has a flexible degree, is constructed to incorporate the set of concepts and their corresponding importance, enabling the extraction of high-order factors from informative concepts and facilitating relational reasoning among these concepts. Extensive experiments demonstrate that OCRT can substantially boost the generalizability and robustness of SAM and CLIP across multiple downstream tasks.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Fujian Province > Xiamen (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area (0.68)
- Information Technology > Security & Privacy (0.48)
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Li, Zizhao, Xiang, Zhengkang, West, Joseph, Khoshelham, Kourosh
Traditional object detection methods operate under the closed-set assumption, where models can only detect a fixed number of objects predefined in the training set. Recent works on open vocabulary object detection (OVD) enable the detection of objects defined by an unbounded vocabulary, which reduces the cost of training models for specific tasks. However, OVD heavily relies on accurate prompts provided by an "oracle", which limits its use in critical applications such as driving scene perception. OVD models tend to misclassify near-out-of-distribution (NOOD) objects that have similar semantics to known classes, and ignore far-out-of-distribution (FOOD) objects. To address these limitations, we propose a framework that enables OVD models to operate in open world settings, by identifying and incrementally learning novel objects. To detect FOOD objects, we propose Open World Embedding Learning (OWEL) and introduce the concept of Pseudo Unknown Embedding, which infers the location of unknown classes in a continuous semantic space based on the information of known classes. We also propose Multi-Scale Contrastive Anchor Learning (MSCAL), which enables the identification of misclassified unknown objects by promoting the intra-class consistency of object embeddings at different scales. The proposed method achieves state-of-the-art performance in common open world object detection and autonomous driving benchmarks.
- North America > United States (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
- Information Technology (0.48)
- Transportation > Ground > Road (0.34)
Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols
Burghouts, Gertjan, Hillerström, Fieke, Walraven, Erwin, van Bekkum, Michael, Ruis, Frank, Sijs, Joris, van Mil, Jelle, Dijk, Judith
We consider the problem of finding spatial configurations of multiple objects in images, e.g., a mobile inspection robot is tasked to localize abandoned tools on the floor. We define the spatial configuration of objects by first-order logic in terms of relations and attributes. A neuro-symbolic program matches the logic formulas to probabilistic object proposals for the given image, provided by language-vision models by querying them for the symbols. This work is the first to combine neuro-symbolic programming (reasoning) and language-vision models (learning) to find spatial configurations of objects in images in an open world setting. We show the effectiveness by finding abandoned tools on floors and leaking pipes. We find that most prediction errors are due to biases in the language-vision model.
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.35)
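To make the matching step in the neuro-symbolic entry above more concrete, here is a minimal sketch of scoring one first-order formula, "there exist t and f such that tool(t), floor(f) and on(t, f)", against probabilistic object proposals. It is an illustration under stated assumptions, not the authors' program: the `Proposal` class, the `on` relation, the product-of-probabilities scoring, and the example boxes and scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max), with y growing downward.
Box = Tuple[float, float, float, float]


@dataclass
class Proposal:
    box: Box
    scores: Dict[str, float]  # assumed zero-shot probability per symbol


def on(a: Box, b: Box) -> float:
    """Crisp stand-in for on(a, b): the boxes overlap horizontally and the
    bottom edge of a falls inside the vertical extent of b."""
    horizontal_overlap = a[0] < b[2] and b[0] < a[2]
    bottom_inside = b[1] <= a[3] <= b[3]
    return 1.0 if horizontal_overlap and bottom_inside else 0.0


def score_exists_on(proposals: List[Proposal], upper: str, lower: str) -> float:
    """Score 'exists p, q: upper(p) and lower(q) and on(p, q)'.
    Conjunction is modelled as a product, the existential as a maximum."""
    best = 0.0
    for i, p in enumerate(proposals):
        for j, q in enumerate(proposals):
            if i == j:
                continue  # a proposal cannot rest on itself
            s = p.scores.get(upper, 0.0) * q.scores.get(lower, 0.0) * on(p.box, q.box)
            best = max(best, s)
    return best


# Made-up proposals: a tool on the floor, the floor itself, a tool on a shelf.
proposals = [
    Proposal((40, 80, 60, 95), {"tool": 0.9, "floor": 0.05}),
    Proposal((0, 70, 200, 120), {"tool": 0.02, "floor": 0.8}),
    Proposal((150, 10, 170, 30), {"tool": 0.6, "floor": 0.01}),
]
result = score_exists_on(proposals, "tool", "floor")
print(f"formula score: {result:.2f}")
```

Product and maximum are only one choice of semantics for conjunction and existential quantification; a real system might use a different probabilistic logic or soft spatial relations.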
BOWLL: A Deceptively Simple Open World Lifelong Learner
Kamath, Roshni, Mitchell, Rupert, Paul, Subarnaduti, Kersting, Kristian, Mundt, Martin
The quest to improve scalar performance numbers on predetermined benchmarks seems to be deeply engraved in deep learning. However, the real world is seldom carefully curated and applications are seldom limited to excelling on test sets. A practical system is generally required to recognize novel concepts, refrain from actively including uninformative data, and retain previously acquired knowledge throughout its lifetime. Despite these key elements being rigorously researched individually, the study of their conjunction, open world lifelong learning, is only a recent trend. To accelerate this multifaceted field's exploration, we introduce its first monolithic and much-needed baseline. Leveraging the ubiquitous use of batch normalization across deep neural networks, we propose a deceptively simple yet highly effective way to repurpose standard models for open world lifelong learning. Through extensive empirical evaluation, we highlight why our approach should serve as a future standard for models that are able to effectively maintain their knowledge, selectively focus on informative data, and accelerate future learning.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- (2 more...)
- Research Report (1.00)
- Overview (0.92)
Open World Object Detection in the Era of Foundation Models
Zohar, Orr, Lozano, Alejandro, Goel, Shelly, Yeung, Serena, Wang, Kuan-Chieh
Object detection is integral to a bevy of real-world applications, from robotics to medical image analysis. To be used reliably in such applications, models must be capable of handling unexpected, or novel, objects. The open world object detection (OWD) paradigm addresses this challenge by enabling models to detect unknown objects and learn discovered ones incrementally. However, OWD method development is hindered by the stringent benchmark and task definitions. These definitions effectively prohibit foundation models. Here, we aim to relax these definitions and investigate the utilization of pre-trained foundation models in OWD. First, we show that existing benchmarks are insufficient in evaluating methods that utilize foundation models, as even naive integration methods nearly saturate these benchmarks. This result motivated us to curate a new and challenging benchmark for these models. Therefore, we introduce a new benchmark that includes five real-world application-driven datasets, including challenging domains such as aerial and surgical images, and establish baselines. We exploit the inherent connection between classes in application-driven datasets and introduce a novel method, Foundation Object detection Model for the Open world, or FOMO, which identifies unknown objects based on their shared attributes with the base known objects. FOMO has ~3x unknown object mAP compared to baselines on our benchmark. However, our results indicate significant room for improvement, suggesting a great research opportunity in further scaling object detection methods to real-world domains. Our code and benchmark are available at https://orrzohar.github.io/projects/fomo/.
- North America > Canada > Alberta > Census Division No. 13 > Lac Ste. Anne County (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (2 more...)
- Health & Medicine > Diagnostic Medicine > Imaging (0.48)
- Leisure & Entertainment > Sports (0.46)
- Health & Medicine > Therapeutic Area (0.46)
Little monsters: why indie developers make the best horror games
Leaf through the history of independent video games and the pages are drenched in horror. It was there in the 1990s shareware era of Doom and Hugo's House of Horrors. It was there too in the Flash games of the early 2000s: Exmortis, the House series, the now lost Hotel 626. And it is here now, in the modern indie age. Lone coders and small development studios have always explored dark stories in haunted houses, lonely forests and seemingly abandoned spacecraft populated by demonic entities.